Variant Discovery ◾ 123
#6- Sorting and indexing BAM files
#Sorting BAM files
mkdir sortedbam
cd bam
for i in $(ls *.bam);
do
samtools sort -T ../sortedbam/tmp.sort -o ../sortedbam/${i}
${i}
done
cd ..
#Indexing the sorted BAM files
cd sortedbam
for i in $(ls *.bam);
do
samtools index ${i}
done
cd ..
#7- Marking/removing duplicate alignments
mkdir dupRemBam
cd sortedbam
for i in $(ls *.bam);
do
samtools rmdup ${i} ../dupRemBam/${i} 2> ../dupRemBam/${i}.log
done;
cd ..
#8- Alignment pileup and variant calling using bcftools
mkdir variants
cd dupRemBam
for i in $(ls *.bam);
do
samtools index ${i}
done;
ls *.bam | rev | rev > bam_list.txt
bcftools mpileup -Ou \
-f ../ref/GCF_009858895.2_ASM985889v3_genomic.fna \
-b bam_list.txt \
| bcftools call -vmO z \
-o ../variants/sarscov2.vcf.gz
cd ..
#9- Index the VCF file
tabix variants/sarscov2.vcf.gz
#to view contents
bcftools view variants/sarscov2.vcf.gz | less -S
The resulted VCF file, which contains the genotypes of all samples (Figure 4.4), can be
further analyzed as we will discuss at the end of this chapter. However, before analysis, the
identified variants must be filtered.